Virtual KITTI NextGen: Procedural Environment Reconstruction for Test Coverage Expansion of Connected and Autonomous Vehicles
In 2025 IEEE International Automated Vehicle Validation Conference (IEEE IAVVC 2025), 2025.
Abstract
Perceptual fidelity plays a critical role in synthetic imagery for closing the sim-to-real gap in computer vision. A central hypothesis is that the perceptual similarity between synthetic and real images, quantified with learned metrics such as LPIPS, predicts downstream machine-learning performance. In this paper, we support this hypothesis by presenting a method for recreating high-fidelity traffic scenes grounded in real-world data and comparing them against their real counterparts. We reconstruct scenes from the KITTI dataset using open geographic data sources, including OpenStreetMap and federal geospatial repositories. The environment layout, static infrastructure, and camera trajectories are derived from real-world constraints, while time-of-day and lighting conditions are matched through solar position calculations based on KITTI timestamps and GPS data. The scenes are rendered in Unreal Engine 5 using physically based materials and real-time global illumination (Lumen). Our approach achieves LPIPS scores up to two times lower than Virtual KITTI v2, indicating stronger perceptual alignment with the original dataset. In addition, we evaluate semantic segmentation performance and observe that models evaluated on our synthetic scenes exhibit smaller domain-induced degradation. These results substantiate the use of perceptual metrics as proxies for model transferability and underscore the importance of realism-focused simulation in bridging the sim-to-real domain gap.
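To make the two measurement steps named in the abstract concrete, the following is a minimal sketch of (1) deriving sun azimuth and elevation from a timestamp and GPS fix, and (2) scoring perceptual similarity between a real and a rendered frame with LPIPS. It assumes the third-party pvlib, lpips, and torch packages; the file paths and the example timestamp and coordinates are hypothetical placeholders, not values from the paper.

```python
import pandas as pd
import pvlib
import lpips

# (1) Solar position from a KITTI-style timestamp and GPS coordinates.
# The timestamp is assumed to have been converted to UTC; coordinates
# here are illustrative (near Karlsruhe, where KITTI was recorded).
when = pd.DatetimeIndex([pd.Timestamp("2011-09-26 11:02:25", tz="UTC")])
solpos = pvlib.solarposition.get_solarposition(when, latitude=49.01, longitude=8.41)
azimuth = float(solpos["azimuth"].iloc[0])                 # degrees clockwise from north
elevation = float(solpos["apparent_elevation"].iloc[0])    # degrees above the horizon
print(f"sun azimuth={azimuth:.1f} deg, elevation={elevation:.1f} deg")

# (2) LPIPS between a real frame and its synthetic counterpart
# (lower scores indicate stronger perceptual similarity).
loss_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, the common default
real = lpips.im2tensor(lpips.load_image("kitti_frame.png"))       # -> [-1, 1], NCHW
synth = lpips.im2tensor(lpips.load_image("rendered_frame.png"))
print(f"LPIPS = {loss_fn(real, synth).item():.4f}")
```

In this setup, the azimuth/elevation pair would drive the directional light in the renderer, and the LPIPS score would be aggregated over matched frame pairs to compare synthetic datasets against the real sequences.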